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Summary. 

Fitting regression models for intensity functions of spatial point processes is of great inte 
ecological and epidemiological studies of association between spatially referenced even 
geographical or environmental covariates. When Cox or cluster process models are used 
commodate clustering not accounted for by the available covariates, likelihood based infe 
becomes computationally cumbersome due to the complicated nature of the likelihooc 
tion and the associated score function. It is therefore of interest to consider alternative 
easily computable estimating functions. We derive the optimal estimating function in a 
of first-order estimating functions. The optimal estimating function depends on the solu 
a certain Fredholm integral equation which in practice is solved numerically. The appro) 
solution is equivalent to a quasi-likelihood for binary spatial data and we therefore use th< 
quasi-likelihood for our optimal estimating function approach. We demonstrate in a simi 
study and a data example that our quasi-likelihood method for spatial point processes i 
statistically and computationally efficient. 

Keywords: Estimating function, Fredholm integral equation, Godambe information, Inl 
function, Quasi-likelihood, Regression model, Spatial point process. 



1. INTRODUCTION 

In many applications of spatial point processes it is of interest to fit a regression moc 
the intensity function. In case of a Poisson point process, maximum likelihood estimat 
regression parameters is rather straightforward with a user-friendly implementation 
able in the R package spatstat. However, if Cox or cluster point process models are 
to accommodate clustering not explained by a Poisson process, then maximum likel 
estimation is in general difficult from a computational point of view (see e.g. M0lle 
Waagepetersen, 2004). Alternati vely, one may follow composite likelihood arguments 
Moller and WaagepetersenL l2007t) to obtain an estimating function that is equivalent 1 



score of the Poisson likelihood function. This provides a computationally tractable est 
ing function and theor e tical properties of the resu lting estimator are we ll understoO' 
e.g. ISchoenbersd (|2005h . IWaagepetersenl (|2007t) and iGuan and Lohl (|2007t ). 
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A drawback of the Poisson score function approach is the loss of efficiency since post 
depe ndence between points is ignored . In t he context of intensity estimation, it appears 
onlv lMrkvicka and Molchanov ( 2005 ) and Guan and Shenl (|2010h have tried to incorpo 
second -order properties in the estimation so as to improve efficiency. iMrkvicka and Mole 
(2005) show that their proposed estimator is optimal among a class of linear, unbiaset 
tensity estimators, where the word 'optimal' refers to minimum variance. However, t 
approach is confined to a very restric tive type of intensity f unction known up to a 
dimensional scaling factor. In contrast, Guan and Shenl ( 2010l ) propose a weighted estii 
ing equation approach that is applicable to intensity functions in more general forms 
similar optimality result can on the other hand not be established for their approach. 



In this paper we derive an optimal estimating function that not only takes into acc< 
possible spatial correlation but also is applicable for point processes with a general regres 
model for the intensity function. In the spirit of generalized linear models the intensi 
given by a differentiable function of a linear predictor depending on spatial covarii 
The optimal estimating function depends on the solution of a certain Fredholm int€ 
equation and reduces to the likelihoo d score in case of a Poisson pro cess. We sho 1 
Section 3.2 that the optimality result in Mrkvicka and Molchanovi (120051) is a spec ial ca: 
our more general result, and that the estimation method in lGuan and Shenl (|2010j ) is or 
crude approximation of our new approach. Apart from being computationally efficient, 
estimating function only requires specification of the intensity function and the so-c; 
pair correlation function, which is another advantage compared with maximum likelil 
estimation. 



For many types of correlated data other than spatial point patterns, estimating funct 
have been widely used for model fitting when maximum likelihood estim ation is comp 
tionally challenging. Ex amples of su ch data include longitudinal data (|Liang and Zt 
Il986l) . time series data (IZegerl 1988), clustered failure time data (iGrav , 20031) and 
tial binary or count data (|Gotwav and Stroud . 1997 ; Lin and Clayton . 20051) . For mof 
these methods, the inverse of a covariance matrix is used in their formulations as a wa 
account for the correlation in data, and optimal i ty can be established when the so-c< 
quasi-score estimating functions are used (|Hevdd . Il997l) . For point processes there is n 
direct analogue of a spatial covariance matrix, but it turns out that a numerical implcr 
tation of our method is clo sely related to the quasi-like lihood for spatial data considere 
Gotwav and Stroup ( 19971 ) and lLin and Clavton ( 2005 ). Our work hence not only lays 
theoretical foundation for optimal intensity estimation, but also fills in a critical gap 
tween existing literature on spatial point processes and the well-established quasi-likclil 
estimation method. We therefore adopt the term quasi-likelihood for our approach. 



Following some background material on point processes and estimating functions 
derive our optimal estimating function and discuss the practical implementation of it b. 
on a numerical solution of the Fredholm integral equation. Asymptotic properties of 
resulting parameter estimator is then considered and the superior performance of the qi 
likelihood method compared with existing ones is demonstrated through a simulation st 
We finally illustrate the practical use of the quasi-likelihood in a data example of t 
tropical tree species. 



2. BACKGROUND 
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In this section we provide background on first- and second-order moments of spatial 
processes, composite likelihood estimation and estimating functions. Throughout th< 
sentation, we use E, Var and Cov to denote expectation, variance and covariance, rt 
tively. 



2. 1. Intensity and Pair Correlation Function 

Let X be a point process on R 2 and let N(B) denote the number of points in In 
any bounded (Borel) set B CM 2 . We assume that X has an intensity function A(-) 
pair correlation function <?(-, •), whereby the first- and second-order moments of the c 
N(B) are given by 

EN(B) = [ A(u)du 
J B 

Cov[N(A),N(B)}= [ A(u)du+ f [ A(u)A(v)[ 3 (u, v) - l]dudv 

JAnB J A JB 



and 



I AC\B JAJB 

for bounded sets A, B C R 2 (jMoller and WaaeepetersenL l2004) . 



For convenience of exposition we assume that g(u,v) only depends on the diffe 
u — v since this is the common assumption in practice. In the following we thus lei 
denote the pair correlation function for two points u and v with u — v = r. Hot 
our proposed optimal estimating function is applicable also in the case of a non-trans 
invariant pair correlation function. 



2.2. Composite Likelihood 

Assume that the intensity function is given in terms of a parametric model A(u) = A( 
where (3 = (/3i, . . . , (3 p ) € l p is a vector of regression parameters. Popular choices < 
parametric model include linear and log linear models, A(u; (3) — z(u)/3 T and logA(u; 
z(u)/3 T , where z(u) = (zi(u), . . . , z v (u)) is a coyariate vector for each u£l 2 . A first- 
log composite likelihood function ( Schoenberei 2005 ; Waagepetersen , 2007 ) for estin. 
of (3 is given by 



logA(u;/3)- f A(u;/3)du 



(e.g. 



, m, where the ce 



Moller and Waag 



uexrw 

where W C K 2 is the observation window. This can be viewed as a limit of log com] 
likelihood functions for binary variables Yj, = l[N(Bi) > 0], i = 1 
form a disjoint partitioning of W and is an indicator function 
2007). The limit is obtained when the number of cells tends to infinity and the areas > 
cells tend to zero. In case of a Poisson process, the composite likelihood coincides wit 
likelihood function. 

The composite likelihood is computationally simple and enjoys considerable popu 
in particular in studies of tropical rain forest ecology where spatial point proces s mode 
fitted to huge spatial p oint pattern data sets of rain forest tree locations (see e.g. lShen < 
2009; Lin et al. . 2011 ). However, it is not statistically efficient for non-Poisson data 
possible correlations between counts of points are ignored. 
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2.3. Primer on Estimating Functions 

Referring to the previous Section l2~2l the composite likelihood estimator of f3 is obtainc 
maximizing the log composite likelihood Assuming that A is differentiable with res 
to (3 with gradient A'(u;/3) = dA(u; (3)/d/3, this is equivalent to solving the follow 



equation: 
where 



e(/3) = 0, 
A'(u;/3) 



■0> = E $tm-f A '( U ;/3)du 



uexnw 

is the gradient of ([3]) with respect to (3. Equations in the form of (Q} are typically refe 



to as estimating equations and functions like e(/3) are called estimating functions (|He 
Il997l) . Note that many other statistical estimation procedures, such as maximum likelif 
estimation, moment based estimation and minimum contrast estimation, can all be wri 
in terms of estimating functions. 

We defer rigorous asymptotic details to Section [5] and here just provide an info] 
overview of properties of an estimator (3 based on an estimating function e(/3). By a f 
order Taylor series expansion at (3, 

e(/3) n e@) + ft -0\S = 0-/3)3, 



where S = — Ede(/3)/d/3 T is the so-called sensitivity matrix (e.g. page 62 in ISond . l2007l) 



the equality is due to e(/3) = as required by (U. It then follows immediately that 
/3 + e(/3)S~ 1 . Thus, with (3 equal to the true parameter value, f3 is approximately unbi 
if Ee(/3) = 0, i.e. e(/3) is an unbiased estimating function. Moreover, Var/3 ss S _1 S 
where S = Vare(/3) and S _1 SS _1 is the asymptotic covariance matrix when the si2 
the data set goes to infinity in a suitable manner (Section[5|). The inverse ofS ~ 1 SS~ 1 
SS _1 S, is called the Godambe information (e.g. Definition 3.7 in lSonel 2007). 



Suppose that two competing estimating functions ei(/3) and e2(/3) with respective 
dambe informations Ii and I2 are used to obtain the estimators (3 1 and (3 2 - Then ei(/ 
said to be superior to e 2 ((3) if Ii — I2 is positive definite, since this essentially means 
j3 1 has a smaller asymptotic variance than (3 2 - If Ii — I2 is positive definite for all post 
e 2 (l3), then we say that ei(/3) has the maximal Godambe information and is an opt: 
estimating function. The resulting estimator (3 1 is then the asymptotically most efficic 



3. AN OPTIMAL FIRST-ORDER ESTIMATING EQUATION 

The estimating function given in ([5]) can be rewritten as 

e,(/9) = J2 f ( u )~ / f(u)A(u;/3)du, 

where f (u) = A'(u; /3)/A(u; (3). In general, f (u) can be any 1 xp real vector valued fund 
where p is the dimension of /3. We call (j6]) a first-order estimating function. Our aim : 
find a function <p so that is optimal within the class of first-order estimating function 
other words, the resulting estimator of (3 associated with e^, is asymptotically most effic: 
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Let S f = Vare f (/3), J f = -de f (/3)/d/3 T and S f = EJ f . Note that £ f , J f ai 
all depend on (3 but we suppress this dependence in this section for ease of present. 
Recalling the definition of optimality in Section 2.3, for to be optimal we must hav< 

S^S^/S^, — SfS f 1 Sf 

is non- negative definite for all f : W —> R p . A sufficient condition for this is 

S^f = Sf 

for all f where S f,/, = Cov[ef (/3), e^(/3)]. This type of condition is provided in Theore 



Hevdd (jl997l ) for discrete or continuous vector- valued data. In Appendix we 



w 



Hence 



short self-contained proof of the su fficiency of flHt in our setting . 

By the Campbell formulae (e.g. Moller and WaagepetersenL 2004L Chapter 4), 

S 0f =/ f T (u)0(u)A(u;/3)du+ f f T (u)0(v)A(u; /3)A(v; (3)[g(u - v) - l]dudv, 

Jw Jw 2 

f T (u)A'(u;/3)du. 

is equivalent to 

/ f T (u){A'(u;/3)-0(u)A(u;/3)-A(u;/3) f 0(v)A(v 5 /9)[ff(u- v) - l]dv}du = 
Jw Jw 

for all f : W — >• R p , which is true if 

A>;/3)-0(u)A(u;/3)-A(u;/3) f 0(v)A(v; (3)[g(u - v) - l]dv = 

Jw 

for all u £ W . Assuming A > , © implies that is a solution to the Fredholm in 
equation (e.g. Hackbuschl 19951 Chapter 3) 

= y - T0, 

where T is the operator given by 

(Tf)(u)= / t(u,v)f(v)dv with t(u,v)=A(v;^)[ 5 (u-v)-l]. 
iff 

Ass ume that q i s con tinuous so that T is compact in the space of continuous funi 
on W (jHackbuschl . Il995l Theorem 3.2.5) and moreover that —1 is not an eig envalu 
return to this condition in the next section). It then follows by Theorem 3.2.1 in lHack 
(1995) that (|10p has a unique solution 



0=(I- 



A 



where I is the identity operator (or, depending on context, the identity matrix) and (T 
is the bounded linear inverse of I + T. We define 

e(/3)=e 4t (j3)= ]T 0(u) - f 0(u)A(u; /3)du, 
uexnw Jw 

S = Vare(/3), J = -de(/3)/d/3 T , S = EJ 
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where by the above derivations, 

S = S= f T (u)A'(u;/3)du. 
Jw 

In the Poisson process case where g(-) — 1, (|T2"j) reduces to the Poisson likelihood score 
We develop a more explicit expression for (p by using Neumann series expansio 
Appendix [Bj The Neumann series expansion is also useful for checking the conditions 
our asymptotic results; see Appendix [U] However, it is not essential for our approach si 
omit the detailed discussion here. 



3. 1 . Condition for non-negative eigenvalues of T 

In general it is difficult to assess the eigenvalues of T given by (ITT|) . However, suppose 
g— 1 is non-negative definite so that T s is a positive operator (i.e., J w f T (u)(T s f )(u)du 
where T s is given by the symmetric kernel 

t s (u, v) = A(u; /3) x / 2 A(v; p) 1 ' 2 [g(u - v) - l] . 

Then all eigenvalues of T s are non- negative (|Laxl . 120021 Corollary 1, p. 320). In partici 
— 1 is not an eigenvalue. The same holds for T since it is easy to see that the eigenv8 
of T coincide with those of T s . 

The assumption of a non-negative definite g(-) — 1 is valid for the wide class of Cox p 
processes which in turn includes the class of Poisson cluster processes. For a Cox pre 
driven by a random intensity function A, g(u,v) = 1 + Cov[A(u), A(v)]/[A(u)A(v)] so 
g(-) — 1 is non- negative definite. 



3.2. Relation to Existing Methods 

Suppose we approximate the operator T by 



(Tf)(u)=/ f(v)A(v;/3)[s(u-v)-l)]dv« A(u;/3)f(u) f [g(u - v) - l]dv. 
Jw Jw 

This is justified if f(v)A(v;/3) is close to f(u)A(u;/3) for the v where g(u — v) — 1 di 
substantially from zero. Then the Fredholm integral equation (jlOl) can be approximate' 

cj) = — - XAcj), 



where 



A(u)= [ [ fl (u-v)-l]dv. 
Jw 



We hence obtain an approximate solution <fi = wX'/X with w(u) = [1 + A(u; /3)A(u] 
Using this approximation in (|12[) we obtain the estimating function 



uexnw 



V(u;/3) 
A(u;/3) 



;(u)A'(u;/3)du, 



ir 



which is precisely the weighted Poisson score suggested in Guan and Shen ( 2010h . 
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IMrkvicka and Molchanov (2005) derived optimal intensity estimators in the situat 
A(u; p) — P7(u) for some known function 7(11) and unknown parameter p > 0. 
the only unknown parameter, a direct application of (fTU| yields 



Sine 



p<p(u) 



w 



0(v)7(v) [g(u- 



l]dv 



which is essentially Corollary 3.1 of IMrkvicka and Molchanovl (|2005l ). It is uncomnw 
an intensity function to be known up to a one-dimensional scaling factor. In contras 
proposed modeling framework for the intensity function closely mimics that used in cla 
regression analysis an d is more general. As a result, ou r method of derivation is comp 
different from that in Mrkvicka and Molchanov (2005). 



4. IMPLEMENTATION 

In this section we discuss practical issues concerning the implementation of our pro 
optimal estimating function. In particular we show in Section [4.21 that a particular nu 
cal approximation of our optimal estimating function is equivalent to a quasi-likelihoi 
binary spatial data for which an iterative generalized least squares solution can be i 
mented. An R implementation will appear in future releases of spatstat. 



4. 1 . Numerical Approximation 

To estimate <j>, consider the numerical approximation 

(T0)(u) = / <(u,v)0(v)dv« Vt(u,u,)0(u l )w I , 

Jw j=1 

where U;,i = l,...,m, are quadrature points with associated weights Wi- Insertin; 
approximation in (JTUJ) with u = u; we obtain estimates <j£>(u;) of 0(u/), I — 1, . . . ,? 
solving the system of linear equations, 

rn A(u /3) 

Then (T<p)(u) as Y^iiL 1 ^( u > Ui)^^)^ an d plugging this further approximation into 
the Nystrom approximate solution of (pTTj) directly becomes 

0( u ) = X J U: ^\ - Yl *( u > u i)0( u i)^- 

A l u ;pj ~[ 

In (|12[) we replace <j> by <fi and we approximate the integral term applying again the qu 
ture rule used to obtain <fi. This leads to 



m 

e(/3) = 0(u)-^0( Ul )A(u 4 ;/3H. 

.,cVnW — 1 
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To estimate 3, we solve e(/3) = iteratively using Fisher scoring. Suppose that 
current estimate is d"'. Then 3" +1 ' is obtained by the Fisher scoring update 



3 



(W) 



3 {l> + e(3 (l) )S 



where 



m 

S = ^^( Ul ) T A'(u i; /3 (i) )^ 



is the numerical approximation of the sensitivity matrix S = J w T (u)A'(u; 3^)du. 

Provided the qua drature scheme is c onvergent, it follows by Lemma 4.7.4, Lemma 1 
and Theorem 4.7.7 in Hackbuschl ( 1995 ) that ||0— 4>\\oo converges to zero as m — > oo. ' 
justifies the use of the Nystrom method to obtain an approximate solution of the Fredl 
integral equation. 



4.2. Implementation as quasi-likelihood 

Suppose that we are using simple Riemann quadrature in (|15[) . Then the w^s corresp 
to areas of some sets Bi that partition W and for each i, Ui £ Bi. Let Yi denote the nun 
of points from X falling in Bi and define — A(ujj (3)wi. If the B.^s are sufficiently s: 
so that the YiS are binary then (fTTf is approximately equal to 

m 

^0(Ui)(li - jtii). 
i=l 

Further, by JTJ and ©, EF; ^ and 

Cov(Y i ,Y j ) = l(i = j) [ A(u;/3)du + / A(u; /3)A(v; /3) [ 5 (u - v) - ljdudv 

J Bi JBiXBj 

V^j- = = j) + jtii/ij- [g(\ii, uj) - l] . 

Define Y = (Y t ) t , n = (^)i and V = [Vij]ij. Then EY ps /x and CovY « V. Morec 
from (IT51) . [0(uj)]j = V _1 D where D = d/i, T /d/3 is the m x p matrix of partial derival 
dfii/d/3j. Hence, becomes 

(y-M)v- x D, 

which is fo rmally a quasi-likelihood s core for spatial data Y with mean /i. and covari; 



matrix V ([Gotwav and Stroud . 119971 ). 

Similarly, S in (fT9"]l becomes D T V _1 D and substituting e in (fT5)l by (f^Tj) . we obtain 
iterative generalized least squares equation 

(^(j+i) _ /3«)D(/3 ( ' ) ) T v( y 9 ( ' ) )- 1 D( y aW) = [y - ^(/gWjjvos^^^DOgW), 



where we have used the notation D(/3), V(/3) and fi(3) to emphasize the dependence o 
V, and u on 3. 
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4.3. Preliminary Estimation of Intensity and Pair Correlation 



Using the notation from Section B~2l V = YfJ (I + G)V /X / where V M = Diag(^i) an 

Gn = y/mvj[g(ui,Uj) - i] 

so that G = [Gij]ij is the matrix analogue of the symmetric operator T s from Sectio 
In general g is unknown and must be replaced by an estimate. Moreover it is advanta 
if G is fixed in order to avoid the computational burden of repeated matrix inversion : 
generalized least squares iterations (1221) . 

To estimate g we assume that g(r) = g(r;xp) where g(-;ip) is a translation invi 
parametric pair correlation function model. We replace ij) and 3 inside G by prelirr 
estimates 3 and i/> which are fixed during th e iterations (|2"2"|). The estimates 3 and i 
be obtained using the two-step approach in Waaeepetersen and Guar] ( 2009t) where 
obtained from the composite likelihood function and xp is a minimum contrast est 
based on the if-function. If translation invariance can not b e assumed, ■»/> may in ste 
estimated by using a second-order composite likelihood as in lJalilian et al.l (|2012l ). 



4.4. Tapering 

The matrix V can be of very high dimension. However, many entries in V are very cl< 
zero and we can therefo re approximate V by a sparse matrix V ta pcr obtained by tar. 
(e.g. iFurrer et all 120061) . More precisely, we replace G in V by a matrix Gtaper obtain 
assigning zero to entries Gij below a suitable threshold. We then compute a sparse n 
Cholesky decomposition, I + G ta per = LL T . Then (Y — /i)V^ 1 ^ 2 (I + G tapcr )~ 1 can be 
computed by solving the equation xLL T = (Y — /x)V pl 1//2 in terms of x using forwar 
back substitution for the sparse Cholesky factors L and L T , respectively. 

In practice, it is often assumed that g{r) — go(|| r ll) for some function g . If g< 
decreasing function of ||r|| then we may define the entries in G tap er as Gj_yT[||Uj — i 
rft a pcr], where <it a per solves [go(d) — l]/[go(0) — 1] = e for some small e. That is, we re 
entries Gij by zero if 5o(|| u i ~ u ill) — 1 is below some small percentage of the maximal 
3o(0)-l. 

When V in (|22|) is replaced by V tap cr we obtain the following estimate of the covai 
matrix of 3: 

s _1 n T v _1 v v 1 ds _1 

t a pcr t a pcr t a pcr t a pcr 

where St ap0 r = D^^V^^j.D. Note that it is not required to invert the non-sparse covai 
matrix V in order to compute (|23|. 



5. ASYMPTOTIC THEORY 



Let W n C M 2 be an increasing sequence of observation windows in R 2 . Following Secti< 
we assume that the true pair correlation function is given by a parametric model g 
g(r;xp) for some unknown parameter vector ip £ M. q . Let 8 = (f3,ip) e R p+<? . We d 
the true value of 9 by 8* = (/3*, ?/>*). In what follows, E and Var denote expectatio: 
variance under the distribution corresponding to 6* . 

Introducing the dependence on n and 6 in the notation from Section [31 we have 



ct> n Ju,3)= (I + T n>e 



v-iA'(-;/3) 



\(-.d) 



(u), (T„, e f)(u) 



/ 

/ i t; 



te(u, v)f(v)dv 
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and 

t e (u,v) = A(v;/3)[.g(u-v;-0)-l]. 

Following Section FOl we replace 6 in the kernel tg by a preliminary estimate n — (0 n , '< 
The estimating function (fT2"]) then becomes e n g (/3) where 

09)= <W u > 

uexnw„ • /w » 
Let /3„ denote the estimator obtained by solving e n g (J3) — 0. Further, define 

S„ = |W n r 1 Vare„ :e .(/3*), J„, e (/3) = --|Le„, e (/3) and S„, e (/3) = (Wg^EJ^Oe 

Note that S n and S„e(/3) are 'averaged' versions of £„ = Vare„e*(/3*) and S n g(f: 
EJ„,a(/9). 

In Appendix iDl we verify the existence of a iWn] 1 ^ 2 consistent sequence of solutions 
i.e., IWnJ 1 ' 2 ^,, — /3*) is bounded in probability. We further show in Appendix lEl 
W„| _1/ ' 2 e $ (/3*)S„ 1 ^ 2 is asymptotically standard normal. The conditions needed 



these results are listed in Appendix [C] It then follows by a Taylor series expansion, 

\w n \-^e . wK 1/2 = \Wn\ 1 ' 2 (&-/r)^~ (M *- l ' i 



\w n \ 

for some b„ E W satisfying ||b n - (3*\\ < \\f3 n - /3*||, and Eg] and Eg] in Appendix [D]t 
|Wn| 1 / a G9 n -)9*)S n , e .09*)I!; 1/a ->JV p (O,I). 

Hence, for a fixed n and since S„ = S„.e*(/3*) by (|13[) . (3 n is approximately normal " 
mean j3* and covariance matrix estimated by IW^I -1 ^ - ^ ^ ^{P n )- 

6. SIMULATION STUDY AND DATA EXAMPLE 

To examine the performance of our optimal intensity estimator relative to composite 
lihood and weighted composite likelihood, we carry out a simulation study under 
Guan and Shen setting. We use the quasi-likelihood implementation of our estim 



as described in Sections 14.2114.41 and hence use the term quasi-likelihood for our appro 
We refrain from a comparison with maximum likelihood estimation due to the lack of a c 
putationally feasible implementation of this method. In addition to the simulation si 
we demonstrate the practical usefulness of our method and discuss computational issu< 
a tropical rain forest data example. 



6.1. Simulation Study 

In the simulation study, following iGuan and Shenl (l201(lh . realizations of Cox proc< 
are generated on a square window W. Each simulation involves first the generation 
zero-mean Gaussian random field Z = {Z(u)} u£ w with exponential covariance func 
c(u) = exp(— ||u||/0.1) and then the generation of an inhomogeneous Thomas process g 
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Table 1. Reduction (%) in MSE (summed for f3 and /3i) for WCL and QL relative 
to CL. 







W = 


[0, l] 2 






W = 


[0,2] 2 






A* = 
WCL 


0.5 
QL 


Pt = 
WCL 


1.0 
QL 


Pi = 
WCL 


0.5 
QL 


ft = 
WCL 


1.0 
QL 


(100, 0.02) 


15.6 


35.9 


41.4 


59.3 


17.2 


39.7 


52.2 


68.5 


(100, 0.04) 


1.5 


34.4 


14.2 


42.2 


11.9 


38.9 


13.6 


55.1 


(200, 0.02) 


4.9 


15.4 


20.2 


34.0 


8.6 


19.9 


26.3 


40.0 


(200, 0.04) 


-3.5 


16.5 


3.0 


26.2 


2.0 


10.3 


-7.5 


18.0 



Z with intensity function A(u; (3) = cxp [/3q + /3iZ(u)] and clustering parameter i/> = ( 
cf. (|2U|) in Appendix [Bl For each simulation (3 = (/3q, /3i) is estimated using composite 
hood (CL), weighted composite likelihood (WCL), and quasi-likelihood (QL). The clusi 
parameter xj) is estimated using minimum contras t estimation based on the i^T-functioi 
Section 10.1 in lMoller and WaaeepetersenL 120041) . 

The simulation window is either W = [0,1] 2 or W — [0,2] 2 . The mean square 
(MSE) of the CL, WCL and QL estimates is computed using 1000 simulations for 
combination of different clustering levels (i.e., different expected numbers of clusters 
100 or 200 and different cluster radii oj* — 0.02 or 0.04), inhomogeneity levels (f3l — 0.5 
and expected number of points (400 in the case of W = [0, l] 2 and 1600 in the case of 
[0, 2] 2 ). The integral terms in the CL, WCL and QL estimating equations are approxu 
using a 50 x 50 grid for W = [0, l] 2 and a 100 x 100 grid for W = [0, 2] 2 . Taperh 
QL is carried out as described in Section 14.41 using c? ta por obtained with e = 0.01 for 
estimated pair correlation function g(-; xjj). For WCL we use ^4(u) sa K(d tapc ^ tp) — tt 
where 

K{t;ip) - / g(r;^)dr. 

J\\r\\<t 

Table 1 shows the reduction in MSE for the WCL and QL estimators relative to tl 
estimator. The reductions show that one can obtain more efficient estimates of the inti 
function by taking into account the correlation structure of the process. As expected 
the theoretical results, the QL estimator has superior performance compared with 
the CL and the WCL estimators in all cases. The improvement over the CL estima 
especially substantial in the more clustered (corresponding to small k* and to*) and 
inhomogeneous (corresponding to /3J — 1) cases where the largest reduction is 68.5/ 
we alluded in Section 3.2, the performance of the WCL estimator may rely on the va 
of the approximation (|14[) . In case of a longer dependence range, the approximat 
expected to be less accurate and this explains the large drop in the efficiency of the 
estimator relative to the CL estimator when u* increases from 0.02 to 0.04. In parti 
the WCL estimator does not appear to perform any better than the CL estimator 
ip* = (200,0.04). In contrast, the QL estimator still gives significant reductions in M 
size 10-26% depending on the value of (3^ and W . 



6.2. Data Example 

A fundamental problem in biological research is to understand the very high biodiv 
in tropical rain forests. One explanation is the niche assembly hypothesis, which i 
that different species coexist by adapting to different environmental niches. Data ava 
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for studying this hypothesis consist of point patterns of locations of trees as well as 
servations of environmental covariates. Figure Q] shows the spatial locations of three 
species, Acalypha diversifolia (528 trees), Lonchocarpus heptaphyllus (836 trees) and ( 
paris f rondosa (3299 trees ) , in a 1000m x 500m observation windo w on Barro Colo: 
Island (|Condit et all . Il996t IConditl . Il998t iHubbell and Fosterl [l983h . Also one examp: 



an environmental variable (potassium content in the soil) is shown. 

In order to study the niche assembly hypothesis we use our quasi-likelihood mel 
to fit log-linear regression models for the intensity functions depending on environme 
variables. In addition to soil potassium content (K, divided by 1000), we consider nine o 
covariates for the intensity functions: pH, elevation (dem), slope gradient (grad), m 
resolution index of valley bottom flatness (mrvbf ), incoming mean solar radiation (sol 
topographic wetness index (twi) as well as soil contents of copper (Cu), mineralized nitn 
(Nmin) and phosphorus (P). The quasi-likelihood estimation was implemented as in 
simulation study using a 100 x 50 grid for the numerical quadrature and tapering tu: 
parameter e = 0.01. 

For each s pecies we initia l ly fit the following pair correlation functions of normal vari; 



mixture type (jJalilian et all 120121 ): 



g(r;tp) = l + c(r;^), r e R 2 , 
where the covariance function c(r; i/>) is either Gaussian 

c(r; (aV))=a 2 exp[-(||r||/o0 2 ], 
Matern {K v is the modified Bessel function of the second kind) 

c(r; {a 2 ,a,v)) = a 

or Cauchy 



2 (||r||/a)^(||r||/a) 



2 I/ - 1 I» 



c(r; (a 2 ,a))=a 2 [l + (||r||/a) 2 r 3/2 . 

These covariance functions represent very different tail behavior ranging from light (G 
sian), exponential (Matern), to heavy tails (Cauchy). The pair correlation function obta 
with the Gaussian covariance function is just a re-parametrization of the Thomas pre 
pair correlation function (1261) . For the Matern covariance we consider three different 
ues of the shape parameter v = 0.25,0.5 and 1. With v = 0.5 the exponential m 
c[r; (<t 2 , a, 0.5)] = a 2 exp(— ||r||/a) is obtained while v = 0.25 and 1 yields respectively i 
convex and a log concave covariance function. 

Figure [2] shows c(-;rp) = g(-;xp) — 1 for the best fitting (in terms of the minimum 
trast criterion for the corresponding i^-function) pair correlation functions: Cauchj 
Acalypha (xp = (15.4,2.3)), Matern (i} = (2.2,15.5,0.5)) for Lonchocarpus and Ma 
(ip = (1.2,30.2,0.25)) for Capparis. The tapering distances corresponding to e = 0.01 
respectively 20.9, 71.3 and 112.2 for the three species. Hence Capparis is the computai 
ally most challenging case. 

Backward model selection with significance level 5% was carried out for each spe 
According to the quasi-likelihood results, potassium (K) is a significant covariate at 
5% level for Acalypha, mineralized nitrogen (Nmin) and phosphorous (P) are significan 
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Fig. 1. Locations of Acalypha, Lonchocarpus, and Capparis trees and image of interpolated | 
sium content in the surface soil (from top to bottom). 
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Fig. 2. Best fitting covariance functions c(-;V0 = s(-;^0 - 1 for Acalypha (left), Lonchoca 
(middle), and Capparis (right). 



Table 2. Computing times (T) in seconds (without computation of standard errors) and QL 
parameter estimates for different combinations of grid size and tapering. 



Grid e 


Acalypha 
T estm. 


Lonchocarpus 
T estm. 


Capparis 
T estm. 


0.05 

100x50 0.01 
.002 


0.3 -6.9 4.4 
0.4 -6.9 4.4 
0.6 -6.9 4.4 


1.1 -6.5 -0.028 -0.16 
2.6 -6.5 -0.028 -0.15 
4.4 -6.5 -0.028 -0.15 


2.4 -5.1 0.021 -2.4 4.2 

7.5 -5.1 0.020 -2.3 3.9 
12.7 -5.1 0.020 -2.3 3.8 


0.05 

150x75 0.01 
.002 


0.5 -6.9 4.3 
1.8 -6.9 4.3 
5.3 -6.9 4.3 


8.5 -6.5 -0.028 -0.16 
23.7 -6.5 -0.028 -0.15 
41.6 -6.5 -0.028 -0.15 


34.9 -5.1 0.021 -2.3 4.1 
80.4 -5.1 0.020 -2.2 3.8 
163.6 -5.1 0.020 -2.2 3.8 



Lonchocarpus while elevation (dem) , gradient (grad) and potassium are significant for ( 
paris. The fitted linear predictors with estimated standard errors in parenthesis are res 
tively -6.9+4.4K (0.085,1-2), -6.5-0.028Nmin-0.15P (0.088,0.0069,0.055) and -5.1+0.020 
2.3grad+3.9K (0.078,0.0090,0.98,1.0). 

The computing time for the QL estimation depends both on the grid used for 
numerical quadrature and the tapering tuning parameter e. We also tried out a 150 : 
grid and e = 0.05 and 0.02 for the QL fitting of the final models. Parameter estimates 
parameter estimation computing time (system plus CPU time on a 2.90 GHz lap top 
all combinations of grid sizes, e and species are shown in Table 2. The computing 1 
for the parameter estimation depends much on both grid sizes, e and species (i.e. ranf 
spatial dependence). Computing time including computation of standard errors is show 
Table 3, together with the computed standard errors for the parameter estimates in Tab 
The computing time with computation of standard errors is less sensitive to e and sp( 
since in this case the main computational burden arises from the non-sparse matrix in ( 
For the 100 x 50 grid and e = 0.01, the maximal computing time of 29.1 seconds (inclu 
computation of standard errors) occurs for Capparis. In contrast to large variations in 
computing time, the parameter estimates and estimated standard errors for each sp< 
are very stable across the combinations of grid sizes and tapering parameter e. 
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Table 3. Computing times (T) in seconds (including computation of standard errors) and esti 
standard errors of QL parameter estimates for different combinations of grid size and tapering 







Acalypha 




Lonchocarpus 




Capparis 


Grid 


e 


T 


sd. 


T 


sd. 


T 


sd. 




0.05 


12.1 


0.085 1.2 


22.4 


0.088 0.0069 0.055 


24.7 


0.078 0.0091 0.9 


100x50 


0.01 


12.0 


0.085 1.2 


24.0 


0.088 0.0069 0.055 


29.1 


0.078 0.0090 0.9 




.002 


12.1 


0.085 1.2 


25.9 


0.088 0.0069 0.055 


34.3 


0.078 0.0090 0.9 




0.05 


59.4 


0.079 1.1 


187.2 


0.087 0.0069 0.055 


223.4 


0.078 0.0090 0.9 


150x75 


0.01 


58.9 


0.079 1.1 


204.6 


0.087 0.0069 0.055 


255.2 


0.078 0.0089 0.9 




.002 


63.6 


0.079 1.1 


226.5 


0.087 0.0069 0.055 


300.9 


0.078 0.0089 0.9 



7. DISCUSSION 



In contrast to maximum likelihood estimation our quasi-likelihood estimation methoc 
requires the specification of the intensity function and a pair correlation function. Mori 
the estimation of the regression parameters can be expected to be quite robust t( 
misspecification of the pair correlation function since the resulting estimating equat 
unbiased for any choice of pair correlation function. In the data example we considerec 
correlation functions obtained from covariance functions of normal variance mixture 
Alter natively one might c onsider pair correlation functions of the log Gaussian Cox pi 
type ( Mailer et al. . 19981) . i.e., g(r) = exp[c(r)], where c(-) is an arbitrary covai 



function. 

If a log Gaussian Cox process is deemed appropriate, a computationally feasible 
native to our approach i s to use the meth od of integrated nested Laplace approxin 
(INLA, [1 ue et all l2009t llllian et all l2012f) to implement Bayesian inference. Howev 
order to apply INLA it is required that the Gaussian field can be approximated well 
Gaussian Markov random field and this can limit the choice of covarianc e function 
examp le, the accurate Gaussian Markov random field approximations in iLinderen 
( 20111 ) of Gaussian fields with Matern covariance functions are restricted to integer v : 
planar case. In contrast, our approach is not subject to such limitations and can a! 
applied to non-log Gaussian Cox processes. 

We finally note that for the Nystrom approximate solution of the Fredholm equati 
used the simplest possible quadrature scheme given by a Riemann sum for a fine grid, 
entails a minimum of assumptions regarding the integrand but at the expense of a typ 
high-dimensional covariance matrix V. There may hence be scope for further develor 
considering more sophisticated numerical quadrature schemes. 
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APPENDIX A. Condition for optimality 

To show that © implies non- negative definiteness of (J7]), let e<^(/3) = ef (/3)S i T 1 Sf0 be 
optimal linear predictor of e^(/3) given ef(/3). Then 

Var[e (/3) - e (/3)] = - S 0f £f 

is non-negative definite whereby 

S ( />S^ 1 S^ — S^S^S^fSj 1 Sf0S ( ^ ) 1 S0 

is non-negative definite too. Hence, (|7J| is non-negative definite provided 

S^S^E^f = Sf 

which follows from © (in particular, © implies S^, = = S^). 

APPENDIX B. SOLUTION USING NEUMANN SERIES EXPANSION 

Suppose that ||T|| op = sup{||Tf|| 00 /||f|| 00 : HfHoo ^ 0} < 1 where ||f ||oo denotes 
supremum norm of a continuous function f on W. Then we can obtain the solutio 
of (1101) using a Neumann series expansion which may provide additional insight on 
properties of <fr. More specifically, 

oo . / 

= 5>T)^. 

If the infinite sum in (j2~i]) is truncated to the first term (k — 0) then (|12p becomes 
Poisson score. Note that 



|T||oo < sup / |t(u,v)|dv. 



uew Jw 

Hence, a sufficient condition for the validity of the Neumann series expansion is 



supA(u;/3) / \g{v) - l|dr < 1. 
new Jr2 



Condition (|23|) roughly requires that g(r) — 1 does not decrease too slowly to zero am 
that A is moderate. For example, suppose that a is the pair correlation function of a Tho 



cluster process (e.g. iNfoller and Waagepetersenj 12004 Chapter 5) 



g(r) — 1 = exp [— ||r|| 2 /(4a; 2 )l /(Attuj 2 k), for some n,w > 0, 
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where k is the intensity of the parent process and uj is the normal dispersal parar 
Then, 

/ Ur)-l|dr= -r^—= f exp(-^%)dr = 1/k 

and (l25|) is equivalent to sup u£W A(u; (3) < n. In this case, Condition ((25)) can be 
restrictive. However, the Neumann series expansion is not essential for our approac. 
we use it only for checking the conditions for asymptotic results; see Appendix [Cl 

APPENDIX C. CONDITIONS AND LEMMAS 

To verify the existence of a | Wn] 1 ^ 2 consistent sequence of solutions /3 n , we assume th: 
following conditions are satisfied: 

Cl A(u;/3) = A(z(u)/3 T ) where A(-) > is twice continuously differentiable and 

sup ugR2 ||z(u)|| < Ki for some K\ < oo. 
C2 for some < K% < oo, f R2 \g(r;ip*) — l|dr < K 2 - 

C3 4> n g (u,(3) is differentiable with respect to 9 and /3, and for \4> n g (u, /3)|, |d</>„ e (u 
and |d</>„ e (u,/3)/d0|, the supremum over u e M 2 ,/3 e b((3*,K 3 ),0 € 6(0*, i 
bounded for some > 0, where 6(x, r) denotes the ball centered at x with i 
r > 0. 

C4 |W n | 1 / 2 (0 n - 0*) is bounded in probability. 

C5 I = liminf„ l n > 0, where for each n, Z„ denotes the minimal eigenvalue of 

Sn,fl'G8*) = l^r^J,^.^*) = \Wn\~ 1 f ne -(u) T A'(u;/3*)du. 

Condition QTjand CE] imply iTfjand IT5] below. 

LI for A(u;/3), A'(u;/3) and A"(u;/3), the supremum over u € R 2 ,/3 € b(f3*,K 3 

b(6*,K3) is bounded. 
L2 for a function /i : K 2 ->• K, 

Var V /i(u) < |W„|[1+ sup A(u;/3*)X 2 1 sup h(uf sup A(u;/3*). 

In particular, |W n | _1 Var Y^uexnw M u ) i s bounded when h is bounded. 

The condition C[3] is not so easy to verify in general due to the abstract nature < 
function <p n g . However, it can be verified e.g. assuming that <p n $ can be expressed usii 
Neum ann series. Condition CQ]holds under conditions specified in lWaagepetersen and 
(|2009h (including e.g. CTJand (©. Condition Q5] is not unreasonable since 



8m.(T) - 1^1-/^ [|^] T [<' + n, 8 -)-'^^](u)du 

and (I + e «) _1 is a positive operator (see Section [XT]) . Since S„ = S n) e*(/3*), (J 
implies 

L3 I — liminf„ l n > where for each n, l n denotes the minimal eigenvalue of S n . 
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To prove the asymptotic normality of |W / n |~ 1 / 2 e n e (/3*)S„ 1 ^ 2 , we assume that 
following additional conditions are satisfied: 

Nl W n — nA where A C (0, 1] x (0, 1] is the interior of a simple closed curve ' 
nonempty interior. 

N2 sup p "(P'k- 1 = 0(k~ e ) for some e > 2, where a(p;k) is the strong mixing coeffic 
( Rosenblattl . fl9561 . For eachp and fc, the mixing condition measures the depend' 



between X fl E\ and X n E% where E\ and Ei are arbitrary Borel subsets of M 2 
of volume less than p and at distance fc apart. 
N3 for some K4 < 00 and k = 3, 4, 

sup / •••/ |Qfc(ui,--- ,u fc )|du 2 ---dufc <Ki, 

uiGR 2 JR 2 Jm 2 



where Qk is the fc-th order cumulant density function of X fe.g. lGuan and Lohl . l2C 



Condi tions N1-N3 correspond to conditions (2), (3) and (6), respectively, in lGuan and 
(|2007l) . See this paper for a discussion of the conditions. 

APPENDIX D. EXISTENCE OF A \W n \^ 2 CONSISTENT f) n 

We use Theorem 2 and Remark 1 in [Waagepetersen and Guan (2009) to show the exist- 



of a iWn] 1 / 2 consistent sequence of solutions /3„. Let ||A||a/ = sup^ \<Hj\ for a mf 
A = [a,ij]ij. With V„ = |W„| 1 / 2 Sy 2 we need to verify the following results: 



Rl UV-iM^O. 
R2 For any d > 0, 



sup IIV-^J^C^-J^gjr^V^IlM 

/3:||(/3-/3*)V„||<d 



converges to zero in probability. 
R3 ||J n § (fl*)/\W n \ — Sn,0*(P*)\\M converges to zero in probability. 

e n (Z^*)^^ 1 * s bounded in probability. 
R5 lim inf n l n > where 

l n = inf xS; 1/2 S n>e .(^*)E; 1/2 x T . 

11*11 = 1 

We now demonstrate that B[1}B[3] hold under the conditions OJjClS] listed in Appendi: 
For each of the results below the required conditions or previous results are indicate 
square brackets. 

R[T] L[T]-L[5] : By 031 HTJ and L[5] the entries in £„ are bounded from below and ab 
Moreover, by L[3]the determinant of S„ is bounded below by l p > 0. 

BE] [BE CH HU I4H CSJ: We show that 

sup ll|W n r 1 [J n ,eG9)- J n ,o'(P*)]\\M 

re.3i:iire-e*.a-a*iiw„iV2ii<d 
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converges to zero in probability. Note 

\W n \- l 3 n AP) = + M„, e (/3) 

where 

L„, e (/3) = - V fi,„, e (u,/3) and M„. e ((3) = [ f 2 ,„,e(u,/3) 

with 

fi,n,e(u,/3) = — — f 0n,e( u i W 

and 

f 2 ,„,e(u,/3) = [A(u;/3)^ T w , e (u,/3) + \' (u; (3) T ^(u, (3)] . 

Define 

^i,n(u) = sup |fi. n ,(»(u,/3) - fj. n , e «(u,/3*)|, i = 1,2 

(0,/3): e*,y3— | W„ I 1 / 2 1| <d 

and note that /ii „(u) converge to zero asn-> oo. Then 

sup |M„, (/3)-M„, e .(/3*)| < / /ii,„(u)du 

(e,/3) : ||(e-e*,/3-/3*)|iy„| 1 /2||< ( i Jr2 

where the right hand side converges to zero by dominated convergence. Moreover, 

sup |L»,e09) -L n , e »(/3*)| < V h 2 .n(u) < 

(ej3) : ||(e-£r,/3H3*)|w- n |V2||<<i u£X 



^ /l 2 ,n(u) - E ^ h 2 ,n(u) + IE J] ^2,n0 



The first term on the right hand side converges to zero in probability by Chebys 
inequality and the second term converges to zero by dominated convergence. 

Iffl H2 C0]: 

l^r 1 J n A(/3*)-S n (/3*) = 

|W»r 1 [J»,8 n 08*)-Jn,fl«03*)] + [|W n |- 1 J„ lfl .03*)-Sn03 

It follows from the proof of F(2] that the first term on the right hand side converges t< 
in probability. The last term converges to zero in probability by Chebyshev's inequal 

RH [CEl Iffl HI CU: Since Vare„ je * (^V,; 1 is the identity matrix, e n , e »(/3*)V 
bounded in probability by Chebyshev's inequality. The result then follows by sh 
that 

l^nl -1 ^ 2 [e n § {(3*) — e n< g* Q3*)] converges to zero in probability. Let 
f„(0) = \W n \- l ^e n , e {F) = 

W X [ ^ -^T<PnA",P*)- [ A(u;/3*)-^0 n , e (u,/3*)d 



uexrw, 
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Then 

\Wn\- 1/2 [e nt e n (P*)-en,o*(P*)] = \W n \^ 2 {e n - 0* )f„(t„) 

where ||t„ - 0*\\ < \\8 n - 0*|| and the factor \W n \ 1/2 (6 n -6*) is bounded in probab: 
Further, 

f n (t n ) = f n (t n )-f„(0*) + f„(0*) 



where i n {6*) converges to zero in probability by Chebyshev's inequality and f„(t„) — f n 
converges to zero in probability along the lines of the proof of 

R[5] [CU IJ3j: Follows directly from CD and L|3J 

APPENDIX E. ASYMPTOTIC NORMALITY OF |W„| ~ 1/2 e n ~ e 

By the proof of R[4]it suffices to show that | W n \~ 1 ^ 2 e n g* 1//2 is asym ptotically 



mal. To do so we use the blocking technique used in iGuan and Lohl ( 20071 ). Specific 
Condition implies that there is a sequence of windows W% = U^W* given for each ■ 
a union of m„ x m n sub squares W^, i = 1, • • • , fc„, such that \W£ \/\W n \ — > 1, m„ = 0< 
and the inter-distance between any two neighboring sub squares is of order n v for s 
4/(2 + e) < < a < 1. Let 



where 



Define 



ueinws W " i=l 



e n,e*(/ 3 )= E ^n,s-( u ;/3)- / </w(u;/3)A(u;/3)du. 



uexnw, 



e^(/3)=£e^.(/3), 

where the e^g»(/3)'s are independent and for each i and n, e^'g,(f3) is distribute! 

<$.(/3). Let sf = |^^are£ e ,(/3*) and sf = I^Vare^ (/3*). We nee 
verify the following results: 



SI 
S2 

S3 



~ B — S — B 

|S„ - S„||m and ||S„ - S n ||ju ^Oasn^oo, 

Wn \~ 1 ^ 2 Vn,8'(P*) (S„ J is asymptotically standard normal, 

/ _ £-1-1/2 

W^rf l~ e n6»*(/3*) ( ) nas the same asymptotic distribution as 
W^|- l / 2 ^ ifl .09*)(^)" 1/a , 

\\W£ r 1/2 ef. e .(/3*) - |Wnr 1/2 e„,fl-G9*)ll converges to zero in probability. 



S4 

91 [CEl CEl JNQ]: This follows from the proof of Theorem 2 in Guan and Loh (2007). 

92 [CU CM 1S01: Conditions CU CUand pimply E[el fl »(/3) 4 l is bounded (see the p 



Quasi-likelihood for Spatial Point Processes 



of Lemma 1 in lGuan and Lohl . 120071 ). Thus, S2 follows from an application of Lyapi 
central limit theorem. 



331 this follows by bounding the difference between t he characteristic functi c 

\W^\~ 1/2 e^ g ,((3*) and |W^|" 1/2 ef e , ((3*) using techniq ues inllbramigov and Linnikl I 
and secondly applying the mixing condition Is(21 see also iGuan et al.l (|200^K 



31 [audi en rcj: 

show Var[e nj 6i» (/3*) - 

\wz\/\w n \ -4 1. 



Recall that |W,f |/|W„| 



— s- 1 due to r\(TJ By Cj5] we only ne 
This is implied by conditions QJ}Qj 



